2 research outputs found

    Recognition of English vowels using top-down method

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.Includes bibliographical references (p. 69-70).Many recognizers use bottom-up methods for recognizing each phoneme or feature, and use the cues and the context to find the most appropriate words or sentences. But humans recognize words not just through bottom-up processing, but also top-down. In many cases of listening, one can usually predict what will come based on the preceding context, or one can determine what has been pronounced by listening to the following sounds. Therefore, if some cues to a word are given, it would be possible to refine the recognition by using the top-down method. This thesis deals with the improvement of the performance of recognition by using the top-down method. And most of the work will be concentrated on the problem of vowel recognition, when the adjacent consonants are known.by Park Chi-youn.S.M

    Consonant landmark detection for speech recognition

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 191-197).This thesis focuses on the detection of abrupt acoustic discontinuities in the speech signal, which constitute landmarks for consonant sounds. Because a large amount of phonetic information is concentrated near acoustic discontinuities, more focused speech analysis and recognition can be performed based on the landmarks. Three types of consonant landmarks are defined according to its characteristics -- glottal vibration, turbulence noise, and sonorant consonant -- so that the appropriate analysis method for each landmark point can be determined. A probabilistic knowledge-based algorithm is developed in three steps. First, landmark candidates are detected and their landmark types are classified based on changes in spectral amplitude. Next, a bigram model describing the physiologically-feasible sequences of consonant landmarks is proposed, so that the most likely landmark sequence among the candidates can be found. Finally, it has been observed that certain landmarks are ambiguous in certain sets of phonetic and prosodic contexts, while they can be reliably detected in other contexts. A method to represent the regions where the landmarks are reliably detected versus where they are ambiguous is presented. On TIMIT test set, 91% of all the consonant landmarks and 95% of obstruent landmarks are located as landmark candidates. The bigram-based process for determining the most likely landmark sequences yields 12% deletion and substitution rates and a 15% insertion rate. An alternative representation that distinguishes reliable and ambiguous regions can detect 92% of the landmarks and 40% of the landmarks are judged to be reliable. The deletion rate within reliable regions is as low as 5%.(cont.) The resulting landmark sequences form a basis for a knowledge-based speech recognition system since the landmarks imply broad phonetic classes of the speech signal and indicate the points of focus for estimating detailed phonetic information. In addition, because the reliable regions generally correspond to lexical stresses and word boundaries, it is expected that the landmarks can guide the focus of attention not only at the phoneme-level, but at the phrase-level as well.by Chiyoun Park.Ph.D
    corecore